HAPLOTYPE ANALYSIS

CROSS-REFERENCE TO RELATED APPLICATIONS
[001] The present application claims the benefit of U.S. provisional application Serial
No. 60/441,046, filed on January 17, 2003, which is herein incorporated by reference in its
entirety.

BACKGROUND OF THE INVENTION 
[002] Genetic polymorphisms are well recognized mechanisms underlying inter-individual
differences in disease risk as well as treatment response in humans (Evans and
Relling (1999) Science 286:487-491; Shields and Harris (2000) J. Clin. Oncol. 18:2309-2316).
Single nucleotide polymorphism (SNP) analysis has drawn much attention because SNPs are
so frequent, raising the hope of identifying genetic markers for, and genes involved in, common
diseases. Also, for many genes, the detection of SNPs known to confer loss of
function provides a simple molecular diagnostic to select optimal medications and dosages 
for individual patients (Evans and Relling (1999) Science 286:487-491). It is common for 
genes to contain multiple SNPs, with haplotype structure being the principal determinant of 
phenotypic consequences (Collins et al. (1997) Science 278:1580-81; Drysdale et al. (2000)
Proc. Natl. Acad. Sci. 97:10483-8; Krynetski and Evans (1998) Am. J. Hum. Genet. 63:11-16).
Therefore, to more accurately associate disease risks and pharmacogenomic traits with 
genetic polymorphisms, reliable methods are needed to unambiguously determine haplotype 
structure for multiple SNPs or other nucleic acid polymorphisms or mutations within genes as 
well as non-coding genomic regions. 

[003] However, current genotyping technologies are only able to determine each 
polymorphism, including SNPs, separately. In other words, they provide no information on
how several polymorphisms are physically associated with each other on a chromosome. A
DNA haplotype, the phase-determined association of several polymorphic markers (e.g.,
SNPs), is a statistically much more powerful tool for disease association studies.
Unfortunately, a haplotype is also much harder to determine. Current experimental
approaches include physical separation of homologous chromosomes by means of mouse
cell line hybrids, cloning into a plasmid, and allele-specific PCR. None of these is simple
enough for routine high-throughput analysis. There are also ways to
computationally determine haplotypes, but the accuracy of such computational analysis is
uncertain.

[004] Approaches that can be used to haplotype SNPs or other nucleic acid
polymorphisms, modifications and/or mutations that reside within relatively close proximity
include, but are not limited to, single-strand conformational polymorphism (SSCP) analysis 
(Orita et al. (1989) Proc. Natl. Acad. Sci. USA 86:2766-2770), heteroduplex analysis (Prior 
et al. (1995) Hum. Mutat. 5:263-268), oligonucleotide ligation (Nickerson et al. (1990) Proc. 
Natl. Acad. Sci. USA 87:8923-8927) and hybridization assays (Conner et al. (1983) Proc. 
Natl. Acad. Sci. USA 80:278-282). A major drawback to these procedures is that they are 
limited to polymorphism detection along short segments of DNA and typically require 
stringent reaction conditions and/or labeling. Traditional Taq polymerase PCR-based
strategies, such as PCR-RFLP, allele-specific amplification (ASA) (Ruano and Kidd (1989)
Nucleic Acids Res. 17:8392), single-molecule dilution (SMD) (Ruano et al. (1990) Proc. 
Natl. Acad. Sci. USA 87:6296-6300), and coupled amplification and sequencing (CAS) 
(Ruano and Kidd (1991) Nucleic Acids Res. 19:6877-6882), are easily performed and highly 
sensitive, but these methods are also limited to haplotyping SNPs along short DNA segments 
(<1 kb) (Michalatos-Beloin et al. (1996) Nucleic Acids Res. 24:4841-4843; Barnes (1994) 
Proc. Natl. Acad. Sci. USA 91:5695-5699; Ruano and Kidd (1991) Nucleic Acids Res. 
19:6877-6882). 

[005] Long-range PCR (LR-PCR) offers the potential to haplotype SNPs that are 
separated by kilobase lengths of genomic DNA. LR-PCR products are commonly genotyped
for SNPs, and haplotypes are then inferred using mathematical approaches (e.g., Clark's
algorithm; Clark (1990) Mol. Biol. Evol. 7:111-122). However, inferring haplotypes in this manner
does not yield unambiguous haplotype assignment when individuals are heterozygous at two
or more loci (Hodge et al. (1999) Nature Genet. 21:360-361). Physically separating alleles
by cloning, followed by sequencing, eliminates any ambiguity, but this method is laborious
and expensive. Long-range allele-specific amplification avoids both of these problems, but
is limited to SNP-containing alleles that have heterozygous insertion/deletion anchors for
PCR primers (Michalatos-Beloin et al. (1996) Nucleic Acids Res. 24:4841-4843). More
complex technologies have also been used, such as monoallelic mutation analysis (MAMA)
(Papadopoulos et al. (1995) Nature Genet. 11:99-102) and carbon nanotube probes (Woolley
et al. (2000) Nature Biotech. 18:760-763), but these are either time consuming (MAMA), or 
require technology that is not widely available (nanotubes). U.S. Patent Application No. US
2002/0081598 discloses a haplotyping method which uses PCR amplification
and DNA ligation to bring the polymorphic nucleic acid sites in a particular allele into close
proximity to facilitate the determination of haplotypes spanning kilobase distances.
However, this method relies on at least two enzymatic steps to create DNA fragments that
can be ligated with other DNA fragments, and on a subsequent ligation step that joins the
fragments into one larger fragment in which the polymorphic sites lie closer together.
These additional sample preparation steps make large scale use and automation of this
technique cumbersome and error prone.

[006] Haplotypes, combinations of several phase-determined polymorphic markers in a
chromosome, are extremely valuable for studies such as disease association (refs. 1, 2) and
chromosome evolution. Direct molecular haplotyping has relied heavily on family data, but is
limited to short genomic regions (a few kilobases). Statistical estimation of haplotype
frequencies can be inconclusive and inaccurate (ref. 3).

[007] With the rapid discovery and validation of several million single nucleotide
polymorphisms (SNPs), it is now increasingly practical to use genome-wide scanning to find
genes associated with common diseases (refs. 1, 2). However, individual SNPs have limited
statistical power for locating disease susceptibility genes. Haplotypes can provide additional
statistical power in the mapping of disease genes (refs. 4-7).

[008] Haplotype determination of several markers for a diploid cell is complicated since
conventional genotyping techniques cannot determine the phases of several different markers.
For example, a genomic region with three heterozygous markers can yield 8 possible 
haplotypes. This ambiguity can, in some cases, be solved if pedigree genotypes are available. 
However, even for a haplotype of only 3 markers, genotypes of father-mother-offspring trios 
can fail to yield offspring haplotypes up to 24% of the time. Computational algorithms such 
as expectation-maximization (EM), subtraction and PHASE are used for statistical estimation
of haplotypes (refs. 4, 8, 9). However, these computational methods have serious limitations in
accuracy, number of markers and genomic DNA length. For example, for a haplotype of
only 3 markers from doubly heterozygous individuals, the error rates of the EM and PHASE
methods for haplotype reconstruction can be as high as 27% and 19%, respectively.
Alternatively, direct molecular haplotyping can be used based on the physical separation of 
two homologous genomic DNAs prior to genotyping. DNA cloning, somatic cell hybrid 
construction, allele-specific PCR and single-molecule PCR (refs. 10-12) have been used, and these
approaches are largely independent of pedigree information. These methods are limited to 
short genomic regions (allele-specific PCR and single molecule PCR) and are prone to errors. 
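The ambiguity described in [008] can be made concrete with a short enumeration. The Python sketch below assumes a hypothetical individual heterozygous at three SNPs with made-up alleles and lists every pair of haplotypes that would produce the same unphased genotype. With three heterozygous markers it finds the 2^3 = 8 possible haplotypes, which group into 4 distinct haplotype pairs that conventional genotyping cannot tell apart.

```python
from itertools import product
from typing import List, Tuple


def compatible_haplotype_pairs(genotype: List[Tuple[str, str]]) -> List[Tuple[str, ...]]:
    """Enumerate unordered haplotype pairs consistent with an unphased genotype.

    genotype holds one (allele1, allele2) tuple per marker, in chromosomal order.
    """
    pairs = set()
    # Each choice vector says which allele of every marker goes on haplotype 1;
    # the other allele of that marker goes on haplotype 2.
    for choice in product((0, 1), repeat=len(genotype)):
        h1 = "".join(alleles[c] for alleles, c in zip(genotype, choice))
        h2 = "".join(alleles[1 - c] for alleles, c in zip(genotype, choice))
        pairs.add(tuple(sorted((h1, h2))))
    return sorted(pairs)


# Hypothetical individual heterozygous at three SNPs (alleles are illustrative).
genotype = [("A", "G"), ("C", "T"), ("G", "T")]
pairs = compatible_haplotype_pairs(genotype)
haplotypes = {h for pair in pairs for h in pair}
print(len(haplotypes), "possible haplotypes in", len(pairs), "possible pairs")
```

Homozygous markers contribute the same allele to both haplotypes, so only heterozygous markers add ambiguity: k heterozygous markers give 2^(k-1) candidate pairs.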
[009] Therefore, a simple and more reliable method, which is also suitable for large
scale and automated haplotype determination of several polymorphic alleles separated by 
several kilobase distances is needed to facilitate the analysis of haplotype structure in 
pharmacogenomic, disease pathogenesis, and molecular epidemiological studies. 

SUMMARY OF THE INVENTION 
[010] The present invention provides an efficient way for high-throughput haplotype
analysis. Several polymorphic nucleic acid markers, such as SNPs, can be simultaneously
and reliably determined through multiplex PCR of single nucleic acid molecules in several
parallel single-molecule dilutions, and statistical analysis of the results from these
parallel single-molecule multiplex PCR reactions yields a reliable determination of the
haplotypes present in the subject. The nucleic acid markers can be any distance apart
on the chromosome. In addition, an approach wherein overlapping DNA markers are
analyzed can be used to link smaller haplotypes into larger haplotypes. Consequently, the 
invention provides a powerful new tool for diagnostic haplotyping and identifying novel 
haplotypes. 

[011] The method of the present invention enables direct molecular haplotyping of
several polymorphic markers separated by several kilobases, or even spanning an entire
chromosome. Distances of about 1, 2, 3, 4, 5-10 or 15-20 kilobases (kb), or of at
least about 25, 30, 35, 40, 45 or 50 kb or more, are preferred.

[012] Polymorphic nucleic acids useful according to the present invention include any 
polymorphic nucleic acids in any given nucleic acid region including, but not limited to, 
single nucleotide substitutions (single nucleotide polymorphisms or SNPs), multiple 
nucleotide substitutions, deletions, insertions, inversions, short tandem repeats including, for 
example, di-, tri-, and tetra-nucleotide repeats, and methylation and other polymorphic 
nucleic acid modification differences. Preferably the polymorphic nucleotides are SNPs. 
[013] A nucleic acid sample from a subject organism, preferably a genomic nucleic acid
sample, is first diluted to a single copy dilution. The phrase "single copy dilution" refers to
a dilution wherein substantially only one molecule of nucleic acid is present or wherein one
or more copies of the same allele are present. When the molecular mass of the nucleic acid is
known, the dilution that yields a single molecule per reaction can be readily calculated by a
skilled artisan. For example, for human genomic DNA, about 3 pg of DNA represents about 
one molecule. Due to stochastic fluctuation in very dilute DNA solutions, the diluted sample 
may have no template nucleic acid molecules or it may have two or more molecules. If no 
molecules are present in the sample, PCR amplification will not be achieved and the result
will be "no genotype". If two or more molecules are present in the sample, the resulting 
amplification products may either be a mixture of two different alleles or represent one allele 
and consequently either a mixed genotype or a single allele genotype, respectively, is 
obtained. 
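The stochastic outcomes described in [013] follow from Poisson statistics. The Python sketch below is illustrative only and rests on assumptions not stated in the text: about 3 pg of human genomic DNA is taken as one haploid genome copy, copies of each of the two homologous chromosomes are treated as independent Poisson counts with mean λ/2 (λ being the expected total copy number per reaction), and amplification and genotyping are assumed to succeed whenever a template is present. The function names are hypothetical.

```python
import math

PG_PER_COPY = 3.0  # assumption from [013]: ~3 pg human genomic DNA ~ one haploid genome copy


def mean_copies(dna_pg: float) -> float:
    """Expected number of haploid genome copies (Poisson mean) in one reaction."""
    return dna_pg / PG_PER_COPY


def outcome_probabilities(lam: float) -> dict:
    """Probabilities of the three outcomes of one single-copy-dilution reaction.

    Copies of each homologous chromosome are modelled as independent Poisson
    counts with mean lam / 2, so the total copy number is Poisson(lam).
    Amplification and genotyping are assumed never to fail.
    """
    p_homolog_absent = math.exp(-lam / 2)
    p_homolog_present = 1.0 - p_homolog_absent
    return {
        "no_genotype": p_homolog_absent ** 2,                       # no template at all
        "single_allele": 2 * p_homolog_present * p_homolog_absent,  # one homolog only
        "mixed": p_homolog_present ** 2,                            # both homologs present
    }


for pg in (3, 5, 9):
    lam = mean_copies(pg)
    probs = outcome_probabilities(lam)
    summary = ", ".join(f"{name}={p:.2f}" for name, p in probs.items())
    print(f"{pg} pg (about {lam:.1f} copies): {summary}")
```

Under these assumptions a reaction is haplotype-informative only when copies of exactly one homolog are present, and that probability, 2x(1 - x) with x = e^(-λ/2), never exceeds one half, which is one way to see why replicate assays are needed.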

[014] To obtain the statistical weight needed to accurately determine a haplotype comprising
at least two markers, more than one replicate dilution is needed. For example, four replicates
of independent multiplex genotyping assays using about 3-4.5 pg of human genomic
DNA, each including the steps of diluting the nucleic acid sample, amplifying the diluted
sample, and genotyping the amplified sample, give a direct haplotyping efficiency of about 90%.
Therefore, preferably at least about 4-25, more preferably at least about 6-20, 8-20, 10-18,
12-18 and most preferably about 10-12 replicates of the same sample are included in the analysis
according to the present invention, each replicate including the steps of diluting the isolated
nucleic acid sample from a subject organism, multiplex amplification of the diluted sample
and genotyping the polymorphic nucleic acid sites from the amplified sample.
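The benefit of replicates in [014] can be estimated with a simple independence model. The sketch below assumes, hypothetically, that each replicate is informative (yields a single-allele result at every marker) with the same probability p_informative, that replicates are independent, and that the subject's unphased genotype is already known, so a single informative replicate fixes both haplotypes. It is not the statistical procedure of the invention, and the probabilities in the example are placeholders rather than measured values.

```python
def prob_haplotype_resolved(p_informative: float, n_replicates: int) -> float:
    """Probability that at least one of n independent replicates is informative."""
    return 1.0 - (1.0 - p_informative) ** n_replicates


def replicates_needed(p_informative: float, target: float) -> int:
    """Smallest number of replicates whose resolution probability reaches target."""
    if not (0.0 < p_informative <= 1.0 and 0.0 < target < 1.0):
        raise ValueError("need 0 < p_informative <= 1 and 0 < target < 1")
    n = 1
    while prob_haplotype_resolved(p_informative, n) < target:
        n += 1
    return n


# Placeholder per-replicate probabilities (hypothetical, not measured values).
for p in (0.3, 0.5):
    curve = [round(prob_haplotype_resolved(p, n), 3) for n in (4, 8, 12)]
    print(f"p={p}: P(resolved) at 4/8/12 replicates = {curve}; "
          f"replicates for 99%: {replicates_needed(p, 0.99)}")
```

The loop is used instead of a closed-form logarithm so that floating-point rounding cannot push the answer up by one replicate.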
[015] After the nucleic acid sample has been diluted to a substantially single-molecule
dilution, the regions containing the polymorphic sites of interest in the nucleic acid are
amplified using, for example, the polymerase chain reaction (PCR) and at least two, preferably
more than two, primer pairs flanking at least two different polymorphic nucleic acid sites in
the target molecule. The primers are selected so that they amplify a fragment of at least about 
50 base pairs (bp), more preferably at least about 100, 200, 300, 400, 500, 600-1000 bp and 
up to about 10000 bp, wherein the fragment contains at least one polymorphic nucleotide site. 
Most preferably, the primer pairs are designed so that the amplification products are about
90-350 bp long, still more preferably about 100-250 bp long. Because it is preferable to
maximize the efficiency of amplification from the single molecule template, shorter fragments
are preferred, at least with current technology. However, it will be self-evident to a
skilled artisan that nucleic acid amplification techniques are constantly developing and
that the efficiency of amplifying longer nucleic acid fragments from very small quantities of
template may be improved; consequently, primers amplifying long fragments, even longer
than those indicated above, may also be used according to the present invention.
[016] The single molecule template is amplified with at least two different primer pairs,
preferably at least 3, 4, 5, 6, 7, 8, 9 or 10 primer pairs, in a multiplex amplification
reaction, and the amplification product is then subjected to genotyping. In one embodiment
of the invention, up to at least about 15, 20, 30, 40, 50 or more primer pairs are preferably
used in one multiplex reaction.

[017] Genotyping can be performed by any means known to one skilled in the art
including, for example, restriction fragment length polymorphism (RFLP) analysis using
restriction enzymes, single-strand conformational polymorphism (SSCP) analysis,
heteroduplex analysis, chemical cleavage analysis, oligonucleotide ligation and hybridization
assays, allele-specific amplification, solid-phase minisequencing, or the MASSARRAY™
system.

[018] The haplotype is subsequently determined by analyzing replicas of at least four 
dilution/amplification/genotyping reactions so as to allow statistically accurate determination 
of the correct haplotype in the subject. The steps including dilution, amplification and 
genotyping from the same subject organism sample are repeated several times to obtain a data 
set which can be statistically analyzed to reveal the correct haplotype in the subject 
organism's sample. The approach does not rely on pedigree data and does not require prior
amplification of the genomic region containing the selected markers, thereby simplifying the
analysis and allowing rapid, automated haplotyping.
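One way to organize the replicate data described in [018] is sketched below in Python. The call encoding (a single allele such as "A", both alleles as "A/G", None for no call), the marker names and the data are all hypothetical. Each replicate is sorted into one of the outcome classes listed for Figure 2 (failed assay, incomplete multiplex, both alleles present, or a single-allele haplotype), and the single-allele haplotypes are counted. A plain tally like this is only a stand-in for the statistical analysis contemplated here, not a description of it.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical call encoding: "A" = single allele, "A/G" = both alleles, None = no call.
Replicate = Dict[str, Optional[str]]


def classify(replicate: Replicate, markers: List[str]) -> Tuple[str, Optional[str]]:
    """Classify one dilution/amplification/genotyping replicate."""
    calls = [replicate.get(m) for m in markers]
    if all(c is None for c in calls):
        return "failed", None
    if any(c is None for c in calls):
        return "incomplete_multiplex", None
    if any("/" in c for c in calls):
        return "both_alleles", None
    return "haplotype", "".join(calls)


def tally(replicates: List[Replicate], markers: List[str]) -> Tuple[Counter, Counter]:
    """Count outcome classes and the single-allele haplotypes observed."""
    outcomes: Counter = Counter()
    haplotypes: Counter = Counter()
    for rep in replicates:
        kind, hap = classify(rep, markers)
        outcomes[kind] += 1
        if hap is not None:
            haplotypes[hap] += 1
    return outcomes, haplotypes


markers = ["S1", "S2", "S3"]  # hypothetical marker names
replicates = [
    {"S1": "A", "S2": "C", "S3": "G"},
    {"S1": "G", "S2": "T", "S3": "G"},
    {"S1": "A/G", "S2": "C/T", "S3": "G"},
    {},
    {"S1": "A", "S2": None, "S3": "G"},
    {"S1": "A", "S2": "C", "S3": "G"},
]
outcomes, haplotypes = tally(replicates, markers)
print(dict(outcomes))
print(haplotypes.most_common(2))
```

In this made-up data set, two distinct haplotypes (ACG and GTG) are recovered from three informative replicates, while the other three replicates are set aside.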

[019] In one embodiment, the invention is drawn to methods for determining a novel 
haplotype of nucleic acid segments, particularly of genes or other contiguous nucleic acid 
segments comprising at least two, preferably at least 3, 4, 5, 6, 7, 8, 9, 10-15, 20, 30, 40,
50-100 or even more distantly spaced nucleic acid polymorphisms.

[020] The methods of the present invention are useful in medicine in determining the 
differences in disease risk or susceptibility and determining treatment response between 
individual patients. The methods, however, are not limited to applications in medicine and 
can be used to determine the haplotype structure of a particular gene, or other contiguous 
DNA segment, within an organism having at least two distally spaced nucleotide 
polymorphisms. Thus, the methods of the invention find further use in the field of 
agriculture, particularly in the breeding of improved livestock and crop plants. 
[021] In one embodiment, the invention provides a method of determining a haplotype 
in a sample obtained from an organism and comparing it to known haplotypes to diagnose a 
disease or disease susceptibility of an organism comprising the steps of identifying at least 
two polymorphic markers within a genomic region; isolating a nucleic acid sample from the 
subject organism and preferably purifying the isolated nucleic acid; diluting the nucleic acid 
sample into substantially single molecule dilution; amplifying the diluted nucleic acid sample 
with at least two primer pairs each capable of amplifying a different region flanking each of 
the polymorphic sites in a multiplex PCR reaction; genotyping the polymorphic sites from the
amplified sample; producing at least three additional genotype replicas from the nucleic acid
sample of the subject organism as described above to allow statistically accurate
determination of the haplotype in the subject organism sample. In a preferred method the
genotyping is performed using primer extension, terminator nucleotides and matrix-assisted
laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) analysis. The
haplotype is thereafter compared to an existing haplotype collection such as a haplotype 
database comprising disease- or disease susceptibility-associated haplotypes, or haplotypes 
associated with treatment responsiveness or unresponsiveness of the specific polymorphic 
markers. A non-limiting example of an existing haplotype database is the Y-STR Haplotype
Reference Database, which can be found at http://ystr.charite.de/index_gr.html.
[022] For example, the R117H mutation in the cystic fibrosis transmembrane conductance
regulator (CFTR) gene has a mild effect without the 5T mutation, and a severe effect when the 5T
mutation is present on the same chromosome. Thus, the R117H-5T haplotype is important
for clinical application to determine the severity of the prognosis of this type of cystic
fibrosis. The method of the present invention allows direct determination of the haplotypes 
with no requirement for patient pedigree genotype information, i.e. information of the 
genotypes from the patient's family members. The same approach can be applied in other 
genetic diseases where, for example, a second mutation on the same chromosome can change 
the disease manifestation from the first mutation. 
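The cis/trans distinction in [022] is exactly what an unphased genotype cannot show. In the hypothetical Python sketch below, two subjects have identical unphased genotypes (R/H at codon 117 and 5T/7T at the poly-T tract), but only the first carries R117H and 5T on the same haplotype. The dictionary keys, allele labels and the in_cis helper are illustrative assumptions, not part of any clinical nomenclature or database.

```python
from typing import Dict, List

Haplotype = Dict[str, str]  # hypothetical encoding: site name -> allele label


def in_cis(haplotypes: List[Haplotype], site_a: str, allele_a: str,
           site_b: str, allele_b: str) -> bool:
    """True if a single haplotype carries allele_a at site_a and allele_b at site_b."""
    return any(h.get(site_a) == allele_a and h.get(site_b) == allele_b
               for h in haplotypes)


# Same unphased genotype (codon 117: R/H; poly-T: 5T/7T), different phase.
cis_subject = [{"codon117": "H", "polyT": "5T"}, {"codon117": "R", "polyT": "7T"}]
trans_subject = [{"codon117": "H", "polyT": "7T"}, {"codon117": "R", "polyT": "5T"}]

for name, haps in (("cis_subject", cis_subject), ("trans_subject", trans_subject)):
    print(name, "R117H-5T in cis:", in_cis(haps, "codon117", "H", "polyT", "5T"))
```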

[023] The invention further provides a method wherein two haplotypes comprising 
several different polymorphic markers can be combined to form a larger haplotype covering a 
larger genomic region. This can be achieved by using one or more primer pairs to amplify 
one common polymorphic marker in two parallel multiplex amplification reactions after first 
diluting the sample as described above. The genotyping is performed as described above and 
the overlapping marker(s) provide a means to combine the two smaller haplotypes into one
larger haplotype comprising all the markers analyzed in both of the two different
multiplex amplification reactions.
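The combination step in [023] amounts to a join on the shared markers. The Python sketch below uses hypothetical marker names and alleles: assay 1 covers markers M1-M4, assay 2 covers M3-M6, and the overlapping markers M3 and M4 link each haplotype from assay 1 to exactly one haplotype from assay 2. It assumes each assay has already yielded the subject's two haplotypes, and it refuses to merge when the shared markers cannot distinguish the candidates, which happens if every shared marker is homozygous.

```python
from typing import Dict, List

Haplotype = Dict[str, str]  # marker name -> allele


def merge_haplotypes(set1: List[Haplotype], set2: List[Haplotype]) -> List[Haplotype]:
    """Join haplotypes from two multiplex assays through their shared markers."""
    shared = set(set1[0]) & set(set2[0])
    if not shared:
        raise ValueError("the two assays share no marker")
    merged = []
    for h1 in set1:
        # Distinct candidates from assay 2 that agree with h1 at every shared marker.
        matches = {tuple(sorted(h2.items())) for h2 in set2
                   if all(h1[m] == h2[m] for m in shared)}
        if len(matches) != 1:
            raise ValueError("shared markers do not link the haplotypes unambiguously")
        merged.append({**h1, **dict(matches.pop())})
    return merged


# Hypothetical haplotypes from two overlapping 4-plex assays.
set1 = [{"M1": "A", "M2": "C", "M3": "G", "M4": "T"},
        {"M1": "G", "M2": "C", "M3": "A", "M4": "T"}]
set2 = [{"M3": "A", "M4": "T", "M5": "C", "M6": "G"},
        {"M3": "G", "M4": "T", "M5": "T", "M6": "G"}]
merged = merge_haplotypes(set1, set2)
for hap in merged:
    print("".join(hap[m] for m in ("M1", "M2", "M3", "M4", "M5", "M6")))
```

Here M4 is homozygous and carries no linking information; the heterozygous marker M3 does the work, consistent with the use of overlapping heterozygous SNPs described for Figure 3.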

[024] In one embodiment, the present invention provides a method for constructing a
database of haplotypes associated with one or more diseases or biological traits using the
methods described above. Such haplotype databases are useful for diagnostic and prognostic
applications. A haplotype obtained from a subject organism suspected of having a condition of
interest can be compared against the haplotype database to allow diagnosis and/or prognosis
of that condition. A condition may be a disease condition or a biochemical or other biological
trait which is associated, for example, with responsiveness to a particular treatment or
pharmaceutical and is determinative of choosing a treatment regime to which, for example, a
human patient would be responsive.

[025] In one embodiment, the polymorphism is a nucleic acid modification, such as a 
methylation difference. For example, in one embodiment, the present invention provides a 
method of determining haplotypes comprised of markers including methylation differences. 
The DNA sample can be treated with any composition, for example, inorganic or organic
compounds, enzymes, etc., that differentially affects the modified, for example, methylated,
nucleotide to effectively create polymorphisms based on methylation states. For example,
the DNA sample is treated with bisulfite (Frommer, M., L. E. McDonald, D. S. Millar, C. M.
Collis, F. Watt, G. W. Grigg, P. L. Molloy, and C. L. Paul. 1992. A genomic sequencing
protocol that yields a positive display of 5-methylcytosine residues in individual DNA
strands. Proc. Natl. Acad. Sci. U.S.A. 89:1827-1831) so that unmethylated cytosine residues
are converted into uracil while methylated cytosines remain unchanged, thus effectively
creating polymorphisms based on methylation states. Haplotypes consisting of polymorphisms
in the DNA region next to the methylation region and the methylation region itself can be
determined in a similar fashion as described above. Bisulfite-treated DNA is diluted to
approximately single copy, amplified by multiplex PCR (with one PCR specific for each
polymorphism), and genotyped by the MassARRAY system.
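The effect of bisulfite treatment described in [025] can be modelled in silico. The Python sketch below is a simplification under stated assumptions: only the top strand is considered, conversion is taken to be complete, and a converted (uracil) position is read as T after PCR. The sequence and the methylated position are hypothetical. It shows how a methylated and an unmethylated copy of the same sequence become a C/T difference that can be genotyped like a SNP.

```python
from typing import Set


def bisulfite_convert(seq: str, methylated_positions: Set[int]) -> str:
    """Top-strand sequence as read after bisulfite treatment and PCR.

    Unmethylated C -> U (amplified as T); a methylated C (0-based index in
    methylated_positions) is protected and stays C. Assumes complete conversion.
    """
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


seq = "ACGTCCGA"  # hypothetical; the CpG cytosine of interest is at index 1
methylated_copy = bisulfite_convert(seq, {1})
unmethylated_copy = bisulfite_convert(seq, set())
print(methylated_copy, unmethylated_copy)
```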

[026] The methylation detection procedure as described above is repeated at least 3, 4, 
5, 6, 7, 8, 9, 10-15, 15-20, 30, 40, 50 or more times, preferably about 12-18 times so as to 
allow statistical analysis of the correct methylation haplotype in the subject organism. 
[027] In a preferred embodiment, the methods of the present invention use mass
spectrometry, for example, MASSARRAY™ system, to genotype the samples. 
[028] Therefore, in one embodiment, the present invention provides a method for
determining a haplotype of a subject comprising the steps of diluting a nucleic acid sample
from the subject into a single molecule dilution; amplifying the diluted single molecule
dilution with at least two different primer pairs designed to amplify a region comprising at 
least two polymorphic sites in the nucleic acid template; genotyping the polymorphic sites in 
the single nucleic acid molecule; and determining the haplotype from the genotypes of at 
least the two polymorphic sites to obtain a haplotype for the subject. 
[029] In one embodiment, the steps of diluting, amplifying and genotyping the nucleic 
acid sample from the subject are repeated at least three times from the same nucleic acid 
sample to obtain at least four genotype replicas from the same subject and thereafter 
comparing the at least four genotype replicas to determine the haplotype. Preferably, at least
4, 5, 6, 7, 8-10, 10-15, 15-20, 30, 50, 50-100 or more genotype replicas are obtained. In one
embodiment, about 12-18 replicas are obtained and the results are analyzed statistically,
for example using the Poisson distribution.
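As one example of the Poisson-based analysis mentioned in [029] (the text does not specify the exact procedure), the mean number of template copies per reaction can be estimated from the fraction of reactions that give no genotype, because under a Poisson model that fraction is e^(-λ). The Python sketch below assumes that every blank reaction truly lacked template; amplification failures would also produce blanks and bias the estimate downward. The function name and the counts in the example are hypothetical.

```python
import math


def estimate_mean_copies(n_reactions: int, n_blank: int) -> float:
    """Poisson estimate of mean template copies per reaction: -ln(fraction blank)."""
    if n_reactions <= 0 or not 0 < n_blank <= n_reactions:
        raise ValueError("need n_reactions > 0 and 0 < n_blank <= n_reactions")
    return -math.log(n_blank / n_reactions)


# Hypothetical run: 48 single-copy-dilution reactions, 18 of them with no genotype.
lam = estimate_mean_copies(48, 18)
print(f"estimated mean copies per reaction: {lam:.2f}")
```

With no blank reactions the estimate is unbounded, so the function rejects that case; a run like that suggests the sample was not diluted enough.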

[030] In one embodiment, the method further comprises comparing the haplotype with a 
haplotype from a control or a database of haplotypes from controls to determine association 
of the haplotype with a biological trait, which can be any biological trait including but not 
limited to various diseases. 

[031] The polymorphisms useful according to the present invention include, but are not
limited to, single nucleotide polymorphisms (SNPs), deletions, insertions, substitutions or
inversions. The polymorphisms may also be a combination of one or more markers selected
from the group consisting of a single nucleotide polymorphism, a deletion, an insertion, a
substitution, an inversion or other types of nucleic acid polymorphisms.
[032] In one embodiment, the genotyping step of the method described above is 
performed using primer extension, preferably MASSARRAY™ technology, and mass 
spectrometric detection, preferably MALDI-TOF mass spectrometry. 
[033] In another embodiment, the invention provides a method of diagnosing a disease 
condition or disease susceptibility by determining a disease related haplotype in a subject 
comprising the steps of diluting a nucleic acid sample from the subject into a single molecule 
dilution; amplifying the diluted single molecule dilution with at least two primer pairs
designed to amplify a region comprising at least two polymorphic sites in the nucleic acid 
template; genotyping the polymorphic sites in the single nucleic acid molecule; determining 
the haplotype from the genotype of at least two polymorphic sites, to obtain a haplotype for 
the subject; and comparing the haplotype of the subject to known disease-associated 
haplotypes wherein a match in the sample haplotype with a disease-associated haplotype 
indicates that the subject has the disease or that the subject is susceptible to the disease.
[034] In one embodiment, the method further comprises repeating the dilution, 
amplification and genotyping steps at least three times from the same nucleic acid sample to 
obtain at least four genotype replicas from the same subject and thereafter comparing the at 
least four genotype replicas to determine the haplotype. Preferably at least 4, 5, 6, 7, 8, 9,
10-15, 15-20, 25, 30, 40, 50-100 or more genotype replicas are produced. In one embodiment,
about 12-18 replicas are produced. 

[035] The invention also provides a method of determining a haplotype of a subject 
comprising the steps of treating a nucleic acid sample from the subject with a composition 
that differentially affects an epigenetically modified nucleotide in the nucleic acid sample to
effectively create polymorphisms based on the epigenetic modification; diluting the treated 
nucleic acid sample into a single copy dilution; amplifying the diluted nucleic acid sample 
using at least two different primer pairs; genotyping the amplified sample; and determining 
the haplotype of the subject from the genotyped sample. The terms "epigenetic" modification 
or "epigenetically" modified nucleotides, as used herein, refer to nucleic acids that are
modified by methylation, acetylation, or other epigenetic manner, i.e. by addition or deletion 
of a chemical or molecular structure on the nucleic acid which addition or deletion has an 
effect on the phenotype of the subject by altering the function of the modified nucleic acid. 
[036] In one embodiment, the method further comprises repeating the steps of dilution, 
amplification and genotyping at least three times to obtain at least four genotype replicas 
from the same subject and thereafter determining a haplotype of the subject based on the 
genotype replicas. In a preferred embodiment, at least 4, 5, 6, 7, 8, 9, 10-15, 15-20, 25, 30, 
40, 50-100, or more replicas are produced. In one preferred embodiment, about 12-18
replicas are produced.
[037] In one embodiment, the epigenetic modification is methylation. 
[038] In yet another embodiment, the epigenetic modification is methylation and the 
composition that is used to treat the nucleic acid is bisulfite. 

[039] In another embodiment, the invention provides a method of determining a 
haplotype in a subject comprising the steps of: digesting a nucleic acid sample from the 
subject with a methylation-sensitive restriction enzyme so that either unmethylated DNA or 
methylated DNA is left intact, depending on which enzyme is used; diluting the digested 
nucleic acid sample to a single molecule concentration; amplifying the diluted nucleic acid 
sample with at least two different primer pairs; genotyping the amplified sample; and 
determining a haplotype of a methylated nucleic acid wherein at least two polymorphic 
markers next to the methylation site, together with the methylation site, constitute a
haplotype.
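The logic of [039] and [040] can be illustrated for HpaII, which recognizes CCGG and does not cut when the internal (CpG) cytosine is methylated. After digestion, an amplicon spanning CCGG sites amplifies only from molecules in which all of those sites were methylated, so polymorphic markers genotyped in the same single-molecule reactions can be assigned to the methylated molecule. The Python sketch below is hypothetical: it ignores hemimethylation and incomplete digestion, and it marks methylated sites by the 0-based start index of each CCGG.

```python
from typing import List, Set


def hpaii_sites(seq: str) -> List[int]:
    """0-based start positions of HpaII recognition sites (CCGG)."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 3) if seq[i:i + 4] == "CCGG"]


def amplifiable_after_hpaii(amplicon: str, methylated_sites: Set[int]) -> bool:
    """True if no CCGG site in the amplicon is cut, i.e. every site is methylated."""
    return all(i in methylated_sites for i in hpaii_sites(amplicon))


amplicon = "ATCCGGTACG"  # hypothetical amplicon with one CCGG site at index 2
print(amplifiable_after_hpaii(amplicon, {2}))    # methylated copy survives digestion
print(amplifiable_after_hpaii(amplicon, set()))  # unmethylated copy is cut
```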

[040] In one embodiment, the methylation-sensitive enzyme is HpaII.

[041] In one embodiment, the method further comprises repeating the steps of diluting, 
amplifying and genotyping at least three times to obtain at least four genotype replicas from 
the same subject and thereafter determining a haplotype of the subject based on the genotype 
replicas. Preferably at least 4, 5, 6, 7, 8, 9, 10-15, 15-20, 25, 30, 40,
50-100, or more replicas are produced. In one preferred embodiment, about 12-18 replicas
are produced.
BRIEF DESCRIPTION OF FIGURES
[042] Figures 1A-1B show a flow chart of multiplex genotyping of single DNA
molecules for haplotype analysis using single nucleotide polymorphisms (SNPs) as markers.
Traditional genotyping methods using a few nanograms (ng) of genomic DNA (about 1600
copies of genomic templates) yield only the genotypes of each individual SNP marker, but
the phases of these SNPs are not determined (shown in the mass spectra at top right in Fig.
1A). Simultaneous genotyping of several markers using multiplex assays with single DNA
molecules (Fig. 1B) allows haplotyping analysis since the two alleles can be physically
separated at very dilute DNA concentrations, as shown in the mass spectra at bottom right in
Fig. 1B. In contrast to other molecular haplotyping methods, the entire haplotype block does
not have to be amplified in this approach. Instead, only about 100 bp around each individual
SNP is amplified for genotyping, resulting in very high efficiency of PCR amplification from
single DNA molecules. The SNP markers can be as far apart as desired, as long as the DNA
molecule is not significantly broken between them.

[043] Figure 2 shows effects of genomic DNA concentration on haplotyping efficiency. 
About 3 pg, 5 pg and 9 pg (or 1, 1.6 and 3 copies of human genomic templates, respectively) 
were used for haplotyping of three SNP markers in the CETP region. The DNA copy number 
in a specific reaction was estimated by the Poisson distribution. The haplotyping result can 
either be a failed assay, successful haplotyping, both alleles present (no phase determination 
for the markers), or an incomplete multiplex. Except for incomplete multiplexes, values are 
percentages from 54 to 144 individual multiplex assays (see specification and example for 
details on the calculation), followed by predicted values using the Poisson distribution. 
[044] Figure 3 shows overlapping multiplex genotyping assays with single DNA 
molecules. Seven SNP markers (A: rs289744, B: rs2228667, C: rs5882, D: rs5880, E: 
rs5881, F: rs291044, G: 2033254) from an 8kb genomic region of the CETP locus were 
chosen (details of these SNPs, their chromosome position and oligonucleotides used for 
genotyping are provided in Table 2). Two 5-plex genotyping assays were designed for these 
7 markers and the overlapping heterozygous SNPs were used to obtain the entire haplotype of 
7 SNP markers. Assays on individual 6 were used to demonstrate how this is carried out. 
Multiplex assay 1 determined the haplotypes of 5 SNPs as AGAGT and CGGGC. Multiplex
assay 2 determined the haplotypes of the other set of 5 SNPs as GGGCT and AGGTT. Then, the
genotypes of the overlapping SNPs (SNPs C, E and F) were used to combine the two 5-SNP
haplotypes into a haplotype of 7 SNPs covering the entire region under investigation.
DETAILED DESCRIPTION OF THE INVENTION
[045] The present invention provides a direct molecular haplotyping approach which is
based upon the surprising discovery that a single molecule dilution of genomic DNA can be
used to separate two homologous genomic DNAs, and that, by using repeated dilutions from
the same subject organism as the starting material for multiplex amplification of different
nucleic acid markers, statistically accurate haplotypes of any subject organism can be
determined. The diluted, amplified sample is then genotyped using, for example,
the MASSARRAY™ system (Fig. 1). Parallel genotyping of several different dilutions from 
the same subject results in statistically accurate haplotype determination in the subject 
organism. 

[046] The approach of the present invention differs significantly from previous
single-molecule PCR methods in that it does not require amplification of the complete
genomic region containing the markers of interest; it is therefore not limited to regions of
only a few kb of DNA. The method of the present invention achieves close to 100% genotyping
and haplotyping success rates with single DNA molecules. Additionally, the multiplex
genotyping approach enables direct haplotype determination without pedigree genotype
information. High-throughput haplotyping can easily be achieved by combining the method of
the present invention with any commercially available genotyping system, such as the
MASSARRAY™ system.

[047] In one embodiment, the invention provides a method of determining a haplotype of a
subject comprising the steps of obtaining a nucleic acid sample, preferably a genomic DNA
sample; diluting the nucleic acid sample into a substantially single-molecule dilution;
amplifying the diluted sample with at least two primer pairs, each designed to amplify a
genomic region containing a nucleic acid polymorphism on one chromosome; and genotyping the
amplified DNA. The steps from dilution onward are repeated at least 3 times and the results
are statistically analyzed, thereby determining the haplotype of the subject organism.
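The repeated dilution, amplification and genotyping steps can be illustrated with a small Monte Carlo sketch. It is a model of the procedure, not the procedure itself, and it rests on two assumptions that the text does not state: each haploid template copy in a reaction carries either homolog with probability 1/2 (so each homolog's copy number is Poisson-distributed with half the mean copy number), and every copy amplifies at each marker with a fixed, hypothetical efficiency. All function names are illustrative.

```python
import math
import random
from collections import Counter


def poisson(lam, rng):
    """Draw a Poisson(lam) integer with Knuth's multiplication method (small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_replicate(hap1, hap2, mean_copies, pcr_efficiency, rng):
    """One single-molecule dilution that is amplified and genotyped (model only)."""
    # Assumed model: each homolog's copy number is Poisson(mean_copies / 2).
    n1 = poisson(mean_copies / 2, rng)
    n2 = poisson(mean_copies / 2, rng)
    calls = []
    for a1, a2 in zip(hap1, hap2):
        # An allele is detected if at least one of its copies amplifies at this marker.
        seen1 = any(rng.random() < pcr_efficiency for _ in range(n1))
        seen2 = any(rng.random() < pcr_efficiency for _ in range(n2))
        alleles = sorted({a for a, seen in ((a1, seen1), (a2, seen2)) if seen})
        calls.append("/".join(alleles) if alleles else "-")  # "-" means no call
    return calls


def tally_resolved(replicates):
    """Count replicates in which every marker shows exactly one allele."""
    counts = Counter()
    for calls in replicates:
        if all(c != "-" and "/" not in c for c in calls):
            counts["".join(calls)] += 1
    return counts


rng = random.Random(7)
replicates = [simulate_replicate("GGC", "ACA", 1.0, 0.95, rng) for _ in range(18)]
print(tally_resolved(replicates).most_common(2))
```

In this model, when both homologs are present but amplify unevenly, a replicate can occasionally give a chimeric single-allele call; taking the most frequent resolved patterns over many replicates keeps such artefacts from dominating.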

[048] The "subject" as used in the specification refers to any organism with an at least
diploid genome, including, but not limited to, worms, fish, insects, plants, murine species
and other mammals, including domestic animals such as cows, horses, dogs and cats, and, most
preferably, humans.

[049] The methods of the present invention are useful, for example, for diagnosing or
determining the prognosis of a disease condition known to be associated with one or more
specific haplotypes; for mapping a disease or other biological trait of currently unknown
cause to a defined chromosomal region using haplotypes in linkage analysis; for determining
novel haplotypes; and for detecting associations between haplotypes and responsiveness to
pharmaceuticals.

[050] Genomic DNA can be obtained or isolated from a subject using any method of 
DNA isolation known to one skilled in the art. Examples of DNA isolation methods can be 
found in general laboratory manuals, such as Sambrook and Russell, MOLECULAR
CLONING: A LABORATORY MANUAL, 3rd Ed., Cold Spring Harbor Laboratory Press,
Cold Spring Harbor, N.Y. (2001), the entirety of which is herein incorporated by reference.
[051] Polymorphic Markers and Oligonucleotides. The number of polymorphic nucleic acid
markers useful according to the present invention is ever increasing. Currently, such markers
are readily available from a variety of publicly accessible databases, and new ones are
constantly being added to the pool of available markers. Markers including restriction
fragment length polymorphisms, short tandem repeats such as di-, tri- and tetranucleotide
repeats, and methylation status can be used as polymorphic markers according to the present
invention. Such markers are well known to one skilled in the art and can be found in various
publications and databases including, for example, the ATCC short tandem repeat (STR)
database at http://www.atcc.org/Cultures/str.cfm.

[052] Particularly useful markers according to the present invention are single nucleotide
polymorphisms (SNPs). Examples of useful SNP databases include, but are not limited to, the
Human SNP Database at http://www-genome.wi.mit.edu/snp/human, the NCBI dbSNP Home Page at
http://www.ncbi.nlm.nih.gov/SNP, the database at
http://lifesciences.perkinelmer.com/SNPDatabase/welcome.asp, the Celera Human SNP database
at http://www.celera.com/genomics/academic/home.cfm?ppage=cds&cpage=snps, and the SNP
Database of the Genome Analysis Group (GAN) at http://www-gan.iarc.fr/SNPdatabase.html.

[053] A number of nucleic acid primers are already available to amplify DNA fragments 
containing the polymorphisms and their sequences can be obtained, for example, from the 
above-identified databases. Additional primers can also be designed, for example, using a 
method similar to that published by Vieux, E.F., Kwok, P-Y and Miller, R.D. in
BioTechniques (June 2002) Vol. 32, Supplement: "SNPs: Discovery of Marker Disease," pp.
28-32. Novel SNPs can also be identified using the MASSARRAY™ Discovery-RT
(SNP-Discovery) system of SEQUENOM Inc. (San Diego, CA).

[054] A number of different nucleotide polymorphism genotyping methods useful according
to the present invention are known to one skilled in the art. Such methods include
restriction fragment length polymorphism (RFLP) analysis, single-strand conformation
polymorphism (SSCP) analysis, denaturing gradient gel electrophoresis (DGGE), temperature
gradient gel electrophoresis (TGGE), chemical cleavage analysis, and direct sequencing of
nucleic acids using labels including, but not limited to, fluorescent and radioactive
labels. All of these methods have been available for at least a decade and are well known to
one skilled in the art.
[055] SNP genotyping can be performed using a number of different techniques known to one
skilled in the art. For example, SNP genotyping by MALDI-TOF mass spectrometry can be
performed using Sequenom's mass spectrometry system, MASSARRAY™. In this method, after
multiplexed PCR has been performed using more than one primer pair, each flanking a
different SNP, a minisequencing primer extension reaction is performed in a single well
using chain-terminator nucleotides. The sizes of the reaction products are determined
directly by MALDI-TOF mass spectrometry, yielding the genotype information. Based upon this
teaching, multiplexing should permit determination of, for example, at least 2, 3, 4 or 5
SNPs in a single well of, for example, a 384-well plate. For example, at least 6-, 7-, 8-,
9- or 10- to 12-plex genotyping can be performed using the MASSARRAY™ system, which can
also be used to raise the multiplexing level of the genotyping reactions even further, for
example to at least 12-15, 20, 30, 40 or 50-100 markers, or higher.

[056] Alternatively, fragment analysis for SNP detection can be performed on batches of 
several samples on a capillary electrophoresis system, for example an ABI PRISM® 3100 
GENETIC ANALYZER (Applied Biosystems, Foster City, CA). For capillary 
electrophoretic analysis, the primers can be labeled with dyes including, but not limited to,
FAM, HEX, NED, LIZ, ROX, TAMRA, PET and VIC.

[057] Single SNP allelic discrimination can further be carried out using the ABI 
PRISM® 7900HT Sequence Detection System (Applied Biosystems, Foster City, CA), 
which allows analysis of single nucleotide polymorphisms (SNPs) using the fluorogenic 5'
nuclease assay.

[058] Yet another available method useful according to the present invention is Arrayed
Primer Extension (APEX), a resequencing method for rapid identification of polymorphisms
that combines the efficiency of a microarray-based assay (an alternative to gel-based
methods; see, e.g., U.S. Patent No. 6,153,379 and Shumaker et al. Hum. Mutat.
7(4):346-354, 1996) with the Sanger nucleic acid sequencing method (Sanger et al., Proc.
Natl. Acad. Sci. 74:5463-5467 (1977)). Generally, microarrays are microchips, for example
glass slides, containing thousands of DNA segments in an ordered array, which allows the
simultaneous analysis of thousands of genetic markers.

[059] Yet another genotyping method useful according to the present invention is the
solid-phase minisequencing technique, which is also based upon a primer extension reaction,
can be used for genotyping SNPs, and can easily be automated (U.S. Patent No. 6,013,431;
Suomalainen et al. Mol. Biotechnol. 15(2):123-131, June 2000).
[060] In general, a primer extension reaction is a modified cycle sequencing reaction in
which at least one dideoxynucleotide (terminator) is present and not all deoxynucleotides are
present at any significant concentration. When a terminator is incorporated into a DNA
strand, no further extension can occur on that strand. In a standard cycle sequencing
reaction, terminators are present only at low concentrations alongside high concentrations of
the normal deoxynucleotides. In single base extension reactions for SNP assays, two or more
fluorescently or radioactively labeled terminator nucleotides (corresponding to the two or
more alleles present at the SNP to be typed) are used.
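A toy model of single base extension shows why a heterozygous template yields two extension products while a template from a single-molecule dilution yields one. It ignores labels, product masses and primer design; strings such as "ddCTP" merely name the dideoxynucleotide triphosphate incorporated opposite each template base, and the function names are hypothetical.

```python
# Watson-Crick pairing: the terminator added is complementary to the template base.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def extension_products(template_bases):
    """Terminators incorporated opposite each template base present at the SNP.

    Each primer is extended by a single dideoxynucleotide, after which the strand
    cannot be extended further, so each distinct template allele gives one product.
    """
    return sorted({"dd" + COMPLEMENT[base] + "TP" for base in template_bases})


def call_genotype(products):
    """Translate the observed terminators back into template-strand alleles."""
    return "/".join(sorted(COMPLEMENT[p[2]] for p in products))


print(extension_products(["A", "G"]))                 # heterozygous template
print(call_genotype(extension_products(["A", "G"])))  # both alleles called
print(call_genotype(extension_products(["G", "G"])))  # one allele only
```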

[061] The steps of the method of the present invention include diluting the nucleic acid
sample into a single-molecule dilution, amplifying the diluted sample, and genotyping the
amplified sample. These steps are repeated at least 3 times, preferably at least 4, 5, 6, 7,
8, 9, 10-15, 15-20, 20-25, or even 25-50 times. Preferably, the steps are repeated about
12-18 times so that the results can be statistically analyzed. The results are analyzed
using the Poisson distribution, by methods known to one skilled in the art. The analysis is
described in detail, for example, in Stephens et al. Am J Hum Genet 46:1149-1155, 1990.
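The Poisson reasoning can be made concrete with a short sketch. It takes from the Example that about 3 pg of human genomic DNA corresponds to one template copy, and further assumes (a simplification not spelled out in the text, which defers to Stephens et al.) that each copy carries either homolog with probability 1/2 and that PCR is 100% efficient. Under these assumptions it predicts, for a heterozygous subject, the expected fractions of failed assays, successful haplotyping and both-allele results.

```python
import math

PG_PER_COPY = 3.0  # from the Example: ~3 pg human genomic DNA ~ 1 template copy


def expected_outcomes(mean_copies):
    """Expected outcome fractions for a heterozygous subject at 100% PCR efficiency."""
    # Assumed model: a given homolog is absent with probability exp(-mean_copies / 2).
    p_absent = math.exp(-mean_copies / 2)
    return {
        "failed": p_absent ** 2,                      # no template at all
        "haplotyped": 2 * p_absent * (1 - p_absent),  # exactly one homolog present
        "both_alleles": (1 - p_absent) ** 2,          # phase cannot be resolved
    }


for pg in (3, 5, 9):
    fractions = expected_outcomes(pg / PG_PER_COPY)
    print(f"{pg} pg:", {k: round(v, 3) for k, v in fractions.items()})
```

In this model the haplotyped fraction peaks at 50% when exp(-λ/2) = 1/2, i.e. at a mean of λ = 2 ln 2 ≈ 1.39 copies, or roughly 4.2 pg under the 3 pg-per-copy assumption.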

[062] A haplotype is defined as a combination of alleles or nucleic acid polymorphisms, such
as SNPs, at closely linked loci that are found on a single chromosome and tend to be
inherited together. Recombination occurs at different frequencies in different parts of the
genome; therefore, the length of haplotypes varies among chromosomal regions and
chromosomes. For a specific gene segment, there are often many theoretically possible
combinations of SNPs, and therefore many theoretically possible haplotypes.
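The ambiguity that direct haplotyping resolves can be shown by enumerating every haplotype pair consistent with an unphased genotype. For k heterozygous biallelic loci there are 2^(k-1) such pairs, only one of which is correct. The sketch below (function name hypothetical) uses the unphased calls A/G, C/G and A/C reported for replicate 10 of Table 1.

```python
from itertools import product


def possible_haplotype_pairs(genotype):
    """Every unordered haplotype pair consistent with an unphased genotype.

    genotype: one (allele1, allele2) tuple per locus, in chromosomal order.
    """
    pairs = set()
    for choice in product((0, 1), repeat=len(genotype)):
        h1 = "".join(alleles[c] for alleles, c in zip(genotype, choice))
        h2 = "".join(alleles[1 - c] for alleles, c in zip(genotype, choice))
        pairs.add(tuple(sorted((h1, h2))))  # (h1, h2) and (h2, h1) are one pair
    return sorted(pairs)


for pair in possible_haplotype_pairs([("A", "G"), ("G", "C"), ("A", "C")]):
    print(pair)
```

Only one of the four printed pairs is the true phase; genotyping single-molecule dilutions picks it out directly instead of inferring it statistically.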

[063] Traditionally, information about gene flow in a pedigree has been used to reconstruct
likely haplotypes for families and individuals. However, even if nucleic acid samples from
all family members are available, which is rarely the case, statistics-based haplotype
analysis frequently does not reveal the correct phase, i.e., haplotype, of the markers.
Additionally, collecting large sample sets from, for example, human families is
time-consuming and expensive. In one embodiment, the present invention provides a method
wherein novel haplotypes are determined using either established or novel nucleic acid
polymorphisms. For example, novel SNPs are first identified using nucleic acid samples
isolated from several subject organisms of the same species. Each polymorphic SNP marker
from a subject is then genotyped individually, for example using about 1-10 ng, preferably
about 5 ng, of genomic DNA. The genomic DNA sample is then diluted to about 1 copy of
genomic template per dilution, and the haplotype is determined by genotyping the SNPs in
the diluted sample, i.e., a sample diluted into a substantially single-molecule dilution.
Alternatively, the sample can be genotyped for each marker first, or in parallel, using a
more concentrated nucleic acid solution. This can be used to verify or control the
haplotype determination made with the diluted sample replicas.
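The dilution to about one template copy is simple arithmetic. The sketch below assumes, following the Example, about 3 pg of genomic DNA per template copy; the stock concentration and template volume in the usage lines are hypothetical values, not taken from the text, and the function name is illustrative.

```python
PG_PER_COPY = 3.0  # from the Example: ~3 pg human genomic DNA ~ 1 template copy


def dilution_plan(stock_ng_per_ul, target_copies, template_ul):
    """Fold dilution and working concentration giving target_copies per reaction."""
    working_pg_per_ul = target_copies * PG_PER_COPY / template_ul
    fold = stock_ng_per_ul * 1000.0 / working_pg_per_ul  # 1 ng = 1000 pg
    return fold, working_pg_per_ul


# Hypothetical: 5 ng/uL stock, ~1 copy per reaction, 1 uL of template per reaction.
fold, working = dilution_plan(5.0, 1, 1.0)
print(f"dilute about {fold:.0f}-fold to {working:.1f} pg/uL")
```

A dilution of this size is usually made in serial steps to limit pipetting error.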

[064] The genomic region to be haplotyped using the method of the present invention is
preferably at least about 1, 2, 3, 4, 5, 6, 7, 8, or 9 kb, more preferably at least about
10 kb or more, at least about 15 kb or more, or at least about 20 kb or more. In one
embodiment, the size of the region containing the polymorphic nucleotides is at least about
25 kb or more, at least about 35 kb or more, at least about 40-45 kb or 45-50 kb, or even
about 50-100 kb or more. Most preferably, the genomic region is about 25 kb or more.

[065] In determining the haplotypes, both the PCR and the genotyping reactions are
preferably "multiplexed," a term meant to include combining at least two, preferably at
least 3, 4, 5, 6, 7, 8, 9, 10-15, or 20-25, extension primers in the same reaction to
identify preferably at least about 3, 4, 5, 6, 7, 8, 9, 10-15, or 20-25 polymorphic nucleic
acid regions in the same genotyping reaction. In one embodiment, at least 30 primer pairs
are used.

[066] In one embodiment, the polymorphism is at least one nucleic acid modification, 
such as a methylation difference. In one embodiment, the present invention provides a 
method of determining haplotypes comprised of markers including methylation differences. 
The method of haplotyping methylation differences according to the present invention 
comprises the steps of diluting a nucleic acid sample from a subject organism into two 
parallel substantially single-molecule dilutions. The two dilutions are subsequently
subjected to a methylation detection assay, for example, an AFLP assay (see, e.g., Vos et al.
Nucleic Acids Res 23:4407-4414, 1995; Xu et al., Plant Molecular Biology Reporter
18:361-368, 2000). The assay described by Vos et al. and Xu et al. is modified to operate
according to the method of the present invention.

[067] In short, two single-molecule dilutions are digested in two parallel reactions with a
mixture comprising a methylation-sensitive enzyme and another enzyme, preferably a less
frequently cutting restriction enzyme. The less frequently cutting restriction enzyme is the
same in both digestion reactions, whereas the methylation-sensitive enzymes added to the two
parallel reactions differ in their capacity to digest methylated and non-methylated nucleic
acids. For example, one dilution is digested with a combination of EcoRI and HpaII, and the
parallel dilution is digested with EcoRI and MspI. The two digested samples are then ligated
using an adapter-ligation solution as described in Vos et al. and Xu et al., and amplified in
parallel reactions using at least two, preferably more than two, primer pairs capable of
recognizing the restriction enzyme recognition sites in the templates. In the
above-described example, EcoRI and HpaII-MspI primers are used. One of the primers is
labeled so as to allow detection of the fragments from the digestions using, for example,
gel electrophoretic methods or mass spectrometric detection.

[068] The methylation detection procedure described above is repeated at least 3 times,
preferably about 6-12 times, so as to allow statistical determination of the correct
methylation haplotype of the subject organism.
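The paired-digest readout can be sketched as follows. The band interpretation used here is the one commonly applied in methylation-sensitive amplified polymorphism (MSAP) studies comparing EcoRI/HpaII and EcoRI/MspI digests; it is an assumption, not spelled out in the text above, and the data layout and function names are hypothetical. Each replicate records, per CCGG-containing fragment, whether a band appeared in the HpaII and MspI reactions, and the per-replicate pattern across fragments is tallied as a candidate methylation haplotype.

```python
from collections import Counter

NO_BAND = "no band"


def score_site(hpaii_band, mspi_band):
    """Interpret one CCGG fragment from the EcoRI/HpaII and EcoRI/MspI reactions."""
    if hpaii_band and mspi_band:
        return "unmethylated"
    if mspi_band:
        return "internal C methylated"      # cut by MspI only
    if hpaii_band:
        return "external C hemimethylated"  # cut by HpaII only
    return NO_BAND                          # site absent, blocked, or no template


def methylation_haplotypes(replicates, sites):
    """Tally per-replicate methylation patterns across the given fragments."""
    counts = Counter()
    for rep in replicates:
        pattern = tuple(score_site(*rep[site]) for site in sites)
        if all(state == NO_BAND for state in pattern):
            continue  # most likely an empty dilution with no template molecule
        counts[pattern] += 1
    return counts


reps = [
    {"f1": (True, True), "f2": (False, True)},
    {"f1": (True, True), "f2": (False, True)},
    {"f1": (False, False), "f2": (False, False)},
    {"f1": (True, True), "f2": (True, True)},
]
for pattern, n in methylation_haplotypes(reps, ["f1", "f2"]).most_common():
    print(n, pattern)
```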

[069] In light of this disclosure, other nucleic acid modification detection technologies,
including methylation detection techniques, may be readily adapted for use according to the
principal steps of the present invention, including single-molecule dilution, digestion,
multiplex amplification and multiplex genotyping. Methylation detection methods may also be
combined to detect both methylation and other polymorphic markers, such as SNPs. In such an
embodiment, the amplification after restriction enzyme digestion is performed not only with
methylation-specific primers but also with primers designed to amplify fragments containing
known nucleic acid polymorphisms, such as SNPs.

[070] In one embodiment, the invention provides a method of creating haplotypes of several
polymorphic nucleotides using overlapping multiplex genotyping assays with single DNA
molecules. For example, markers from a large genomic region are chosen, one or more separate
multiplex amplification reactions are performed on single-molecule dilutions, and
overlapping heterozygous polymorphic markers are used to obtain the entire haplotype.

[071] For example, Figure 3 shows seven SNP markers (A: rs289744, B: rs2228667, C:
rs5882, D: rs5880, E: rs5881, F: rs291044, G: 2033254) from an 8 kb genomic region of the
CETP locus that were chosen for haplotype determination. Details of these SNPs, their
chromosomal positions and the oligonucleotides used for genotyping are provided in Table 2.
Two 5-plex genotyping assays were designed for the 7 markers, and the overlapping
heterozygous SNPs were used to obtain the complete 7-SNP haplotype. Assays on individual
No. 6 illustrate the procedure. Multiplex assay 1 determined the two haplotypes of its
5 SNPs as AGAGT and CGGGC, and multiplex assay 2 determined the two haplotypes of its
5 SNPs as GGGCT and AGGTT. The genotypes of the overlapping SNPs (SNPs C, E and F) were
then used to combine the two sets of 5-SNP haplotypes into 7-SNP haplotypes covering the
entire region under investigation.
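The text does not state which five markers each assay typed. Assuming assay 1 typed A, B, C, E and F and assay 2 typed C, D, E, F and G, with each haplotype string listing alleles in that order (a reading consistent with the reported strings, since each assay-1 haplotype then agrees with exactly one assay-2 haplotype at C, E and F), the joining step can be sketched as follows. In this reading SNP E is G on both haplotypes of individual No. 6, so SNPs C and F carry the phase information. The function name is hypothetical.

```python
def merge_by_overlap(markers1, haps1, markers2, haps2, order):
    """Join two assays' haplotype pairs through the markers they share.

    markers*: marker names typed by each assay, in the order of its haplotype
    strings; haps*: that assay's two haplotypes; order: output marker order.
    """
    shared = [m for m in markers1 if m in markers2]
    merged = []
    for h1 in haps1:
        left = dict(zip(markers1, h1))
        # Keep the assay-2 haplotype(s) that agree with h1 at every shared marker.
        matches = sorted({h2 for h2 in haps2
                          if all(dict(zip(markers2, h2))[m] == left[m] for m in shared)})
        if len(matches) != 1:
            raise ValueError("shared markers do not resolve the phase unambiguously")
        combined = {**left, **dict(zip(markers2, matches[0]))}
        merged.append("".join(combined[m] for m in order))
    return merged


# Assumed assay composition (not stated in the text), individual No. 6.
print(merge_by_overlap(list("ABCEF"), ["AGAGT", "CGGGC"],
                       list("CDEFG"), ["GGGCT", "AGGTT"],
                       order=list("ABCDEFG")))
```

Raising an error when the shared markers are all homozygous reflects the requirement in the text that the overlapping markers be heterozygous.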

EXAMPLE 

[072] The effects of genomic DNA concentration on haplotyping efficiency were determined as
follows. We used 3 picograms (pg), 5 pg and 9 pg (equivalent to 1, 1.6 and 3 genomic
template copies) of genomic DNA for PCR amplification and genotyping of 3 SNPs in the CETP
region from 12 individuals. Each 3-plex assay was repeated 12-18 times to evaluate the PCR
and haplotyping efficiency. A typical assay result is summarized in Table 1. The copy number
of the genomic DNA region of interest in very dilute DNA solutions is estimated from the
Poisson distribution (ref. 13). Haplotyping results were categorized into 4 groups
(Table 1).

[073] Failed assays can result either from failed PCR amplification of single-copy DNA or
simply from the absence of template due to stochastic fluctuation in very dilute DNA
solutions.

[074] Partially failed genotyping calls (or incomplete multiplexes) are those in which only
1 or 2 SNPs are successfully genotyped. These are most likely due to unsuccessful PCR of 1
or 2 of the SNP DNA regions, since in most cases the 3 SNP markers are present or absent
together owing to their close proximity (< 628 bp). Poisson sampling of the template may
also result in the presence of both alleles in the solution and hence the inability to
resolve the phase of the SNPs.

[075] Successful haplotyping analysis is achieved when a single copy of the allele or 
multiple copies of the same allele are present and the genotyping is successful. 
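The four outcome groups can be assigned mechanically from the per-SNP calls. In the sketch below (function name hypothetical), a call is a single allele, two alleles separated by "/", or "-" for no call. Replicates 3, 5 and 7 of Table 1, which show no genotype call, are read as failed assays in line with footnote c; that reading is an interpretation rather than something the table states explicitly.

```python
from collections import Counter


def classify_call(calls):
    """Assign one replicate to one of the four outcome groups.

    calls: one entry per SNP, e.g. "G" (single allele), "A/G" (both alleles)
    or "-" (no call).
    """
    if all(c == "-" for c in calls):
        return "failed"
    if any(c == "-" for c in calls):
        return "incomplete multiplex"
    if any("/" in c for c in calls):
        return "both alleles"
    return "haplotyped"


# Table 1 replicates 1-12; replicates 3, 5 and 7 are taken to be no-call assays.
table1 = ["GGC", "GGC", "---", "-GC", "---", "GGC",
          "---", "ACA", "-GC", "A/G C/G A/C", "ACA", "ACA"]
rows = [row.split() if " " in row else list(row) for row in table1]
print(Counter(classify_call(r) for r in rows))
```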
[076] Incomplete multiplex genotyping can be used to estimate the efficiency of 
genotyping from single copy DNA molecules. A partial genotyping call suggests the 
presence of the SNP DNA but a failure to genotype some of the SNPs. We typically 
observed 5-10% incomplete multiplex genotyping calls (Fig. 2), suggesting a PCR efficiency 
of about 90-95% with single DNA molecules. This approach may overestimate the PCR 
efficiency, since we did not take the completely failed assays into account. We also carried
out a detailed comparison between observed and theoretical values for failed assays,
successful haplotyping and the presence of both alleles (Fig. 2; see the Materials and
Methods section for details of the calculation). Theoretical values are based on the Poisson
distribution of very dilute DNA solutions and the assumption of 100% PCR amplification
efficiency. The close agreement between theoretical estimates and experimental observations
substantiates the earlier estimate of extremely high PCR efficiency with single DNA
molecules.

[077] The high PCR efficiency is mainly due to the high efficiency of amplifying very short
amplicons (typically 100 bp) and the high sensitivity of MALDI-TOF mass spectrometric
detection of DNA oligonucleotides. High PCR efficiency is preferred for high-throughput
haplotyping analysis. For example, with our current PCR efficiency, we can achieve 40-45%
haplotyping efficiency in a single reaction using 3-4.5 pg genomic DNA. Four independent
replicate multiplex genotyping assays should yield a direct haplotyping efficiency of about
90%.
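The replicate arithmetic can be checked with a short sketch, assuming replicates succeed independently of one another. The sketch also assumes that one resolved replicate suffices for a heterozygous subject whose full genotype is already known, because the second haplotype is then the complement of the first. With 40-45% per reaction, four replicates give roughly 87-91%, in line with the figure of about 90% above. Function names are illustrative.

```python
import math


def cumulative_success(per_reaction, replicates):
    """Chance that at least one of n independent replicates is fully resolved."""
    return 1 - (1 - per_reaction) ** replicates


def replicates_needed(per_reaction, target):
    """Smallest number of replicates whose cumulative success reaches target."""
    return math.ceil(math.log(1 - target) / math.log(1 - per_reaction))


for p in (0.40, 0.45):
    print(p, round(cumulative_success(p, 4), 3), replicates_needed(p, 0.99))
```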

[078] We next demonstrated an approach for determining haplotypes where there are 
too many markers to be determined in one multiplex genotyping assay. Overlapping 
informative SNPs were used to combine haplotypes from several multiplex assays. We chose 
six SNP markers in an 8kb CETP genomic region, and 2 overlapping 4-plex genotyping 
assays were used for haplotyping analysis (Fig. 3). We were able to determine the haplotypes 
of all 12 individuals for this genomic region, with absolutely no optimization of the assay 
system. 

[079] The approach presented here provides a powerful and unique technology platform 
for direct molecular haplotyping analysis of long-range genomic regions. This approach is 
completely independent of pedigree genotype information. 

[080] We have further integrated this technique with the commercially available
MASSARRAY™ system for high-throughput applications. This technology is extremely
useful in large-scale haplotyping and haplotype-based diagnostics.

Materials and Methods 
[081] Genomic DNAs and oligonucleotides. Human genomic DNA samples used for
haplotyping of the CETP locus were provided by SEQUENOM Inc. (San Diego, CA). These
DNAs were isolated using the Puregene DNA isolation kit (Gentra Systems) from blood
samples purchased from the Blood Bank (San Bernardino County, CA). The personal
background of the blood donors is not accessible for these samples. Human genomic DNA
samples for haplotyping of a 25 kb segment on chromosome 5q31 were CEPH family DNAs
purchased from the Coriell Cell Repositories (see Table 3). Information on SNPs and
oligonucleotides for genotyping is provided in Table 2.

[082] Genotyping and haplotyping analysis. Genotyping analyses were carried out
using the MASSARRAY™ system (SEQUENOM Inc.). Each SNP from every individual was
first genotyped individually using 5 ng genomic DNA. For haplotyping analysis, multiplex
genotyping assays were carried out using 3 pg (approximately 1 copy of genomic template,
unless otherwise specified) of genomic DNA.

[083] Analysis of effects of genomic DNA concentration on haplotyping. To calculate
the percentage of failed assays, we counted all failed assays (no call for any SNP) and
divided by the total number of assays. We typically performed 12 to 18 replicates for each
of 6 or 12 individuals. The percentage of incomplete assays was calculated in the same way.
To calculate the percentages of successful haplotyping and of both alleles present, we
excluded the data from individuals with homozygous haplotypes. Theoretical predictions are
based on the Poisson distribution of very dilute DNA solutions, according to a published
method (ref. 13).
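The counting rules of the preceding paragraph can be expressed as a short function (name hypothetical). The text does not say whether failed and incomplete assays from heterozygous individuals remain in the denominator for the successful-haplotyping and both-allele percentages; the sketch assumes they do. The usage lines combine the Table 1 outcomes (reading its blank rows as failed assays) with six hypothetical replicates from a homozygous individual.

```python
from collections import Counter


def outcome_percentages(assays):
    """Outcome percentages following the counting rules described above.

    assays: (is_heterozygous, outcome) pairs, one per replicate assay.
    Failed and incomplete assays are counted over all assays; successful
    haplotyping and both-allele results over heterozygous individuals only.
    """
    if not assays:
        raise ValueError("no assays to summarize")
    all_counts = Counter(outcome for _, outcome in assays)
    het_counts = Counter(outcome for het, outcome in assays if het)
    n_all = len(assays)
    n_het = sum(het_counts.values())
    pct = {k: 100.0 * all_counts[k] / n_all for k in ("failed", "incomplete multiplex")}
    for k in ("haplotyped", "both alleles"):
        pct[k] = 100.0 * het_counts[k] / n_het if n_het else float("nan")
    return pct


table1 = (["haplotyped"] * 6 + ["failed"] * 3
          + ["incomplete multiplex"] * 2 + ["both alleles"])
assays = [(True, o) for o in table1] + [(False, "haplotyped")] * 6
print({k: round(v, 1) for k, v in outcome_percentages(assays).items()})
```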

Table 1. Sample haplotype analysis with a triplex genotyping assay (a)

Repeat    Genotype calls
1         GGC (b)
2         GGC
3         no call (c)
4         -GC (d)
5         no call
6         GGC
7         no call
8         ACA
9         -GC
10        A/G C/G A/C (e)
11        ACA
12        ACA

(a) Genotypes of 3 SNP markers were determined with triplex assays from 3 pg genomic DNA.
(b) The 3 SNPs are G, G and C, respectively.
(c) Failed to genotype any of the 3 SNPs.
(d) Failed to genotype the first SNP; the remaining two SNPs are G and C, respectively.
(e) Failed to separate the two alleles; the genotypes are thus A/G, C/G and A/C for the
3 SNPs.
Table 3. DNA samples used in the Example.

Repository Number  Sample Type  Sample Description       Relation

GM12547            Lymphoblast  CEPH/FRENCH PEDIGREE 66  father
GM12548            Lymphoblast  CEPH/FRENCH PEDIGREE 66  mother
GM12549            Lymphoblast  CEPH/FRENCH PEDIGREE 66  son
GM12550            Lymphoblast  CEPH/FRENCH PEDIGREE 66  daughter
GM12551            Lymphoblast  CEPH/FRENCH PEDIGREE 66  daughter
GM12552            Lymphoblast  CEPH/FRENCH PEDIGREE 66  son
GM12553            Lymphoblast  CEPH/FRENCH PEDIGREE 66  daughter
GM12554            Lymphoblast  CEPH/FRENCH PEDIGREE 66  daughter
GM12555            Lymphoblast  CEPH/FRENCH PEDIGREE 66  son
GM12556            Lymphoblast  CEPH/FRENCH PEDIGREE 66  paternal grandfather
GM12557            Lymphoblast  CEPH/FRENCH PEDIGREE 66  paternal grandmother
GM12558            Lymphoblast  CEPH/FRENCH PEDIGREE 66  maternal grandfather
GM12559            Lymphoblast  CEPH/FRENCH PEDIGREE 66  maternal grandmother

GM07038            Lymphoblast  CEPH/UTAH PEDIGREE 1333  father
GM06987            Lymphoblast  CEPH/UTAH PEDIGREE 1333  mother
GM07004            Lymphoblast  CEPH/UTAH PEDIGREE 1333  son
GM07052            Lymphoblast  CEPH/UTAH PEDIGREE 1333  son
GM06982            Lymphoblast  CEPH/UTAH PEDIGREE 1333  son
GM07011            Lymphoblast  CEPH/UTAH PEDIGREE 1333  daughter
GM07009            Lymphoblast  CEPH/UTAH PEDIGREE 1333  son
GM07678            Lymphoblast  CEPH/UTAH PEDIGREE 1333  son
GM07026            Lymphoblast  CEPH/UTAH PEDIGREE 1333  son
GM07679            Lymphoblast  CEPH/UTAH PEDIGREE 1333  son
GM07049            Lymphoblast  CEPH/UTAH PEDIGREE 1333  paternal grandfather
GM07002            Lymphoblast  CEPH/UTAH PEDIGREE 1333  paternal grandmother
GM07017            Lymphoblast  CEPH/UTAH PEDIGREE 1333  maternal grandfather
GM07341            Lymphoblast  CEPH/UTAH PEDIGREE 1333  maternal grandmother
GM11820            Lymphoblast  CEPH/UTAH PEDIGREE 1333  daughter

GM07029            Lymphoblast  CEPH/UTAH PEDIGREE 1340  father
GM07019            Lymphoblast  CEPH/UTAH PEDIGREE 1340  mother
GM07062            Lymphoblast  CEPH/UTAH PEDIGREE 1340  daughter
GM07053            Lymphoblast  CEPH/UTAH PEDIGREE 1340  daughter
GM07008            Lymphoblast  CEPH/UTAH PEDIGREE 1340  son
GM07040            Lymphoblast  CEPH/UTAH PEDIGREE 1340  son
GM07342            Lymphoblast  CEPH/UTAH PEDIGREE 1340  son
GM07027            Lymphoblast  CEPH/UTAH PEDIGREE 1340  son
GM06994            Lymphoblast  CEPH/UTAH PEDIGREE 1340  paternal grandfather
GM07000            Lymphoblast  CEPH/UTAH PEDIGREE 1340  paternal grandmother
GM07022            Lymphoblast  CEPH/UTAH PEDIGREE 1340  maternal grandfather
GM07056            Lymphoblast  CEPH/UTAH PEDIGREE 1340  maternal grandmother
GM11821            Lymphoblast  CEPH/UTAH PEDIGREE 1340  son

GM07349            Lymphoblast  CEPH/UTAH PEDIGREE 1345  father
GM07348            Lymphoblast  CEPH/UTAH PEDIGREE 1345  mother
GM07350            Lymphoblast  CEPH/UTAH PEDIGREE 1345  daughter
GM07351            Lymphoblast  CEPH/UTAH PEDIGREE 1345  son
GM07352            Lymphoblast  CEPH/UTAH PEDIGREE 1345  son
GM07353            Lymphoblast  CEPH/UTAH PEDIGREE 1345  son
GM07354            Lymphoblast  CEPH/UTAH PEDIGREE 1345  daughter
GM07355            Lymphoblast  CEPH/UTAH PEDIGREE 1345  son
GM07356            Lymphoblast  CEPH/UTAH PEDIGREE 1345  son
GM07347            Lymphoblast  CEPH/UTAH PEDIGREE 1345  paternal grandfather
GM07346            Lymphoblast  CEPH/UTAH PEDIGREE 1345  paternal grandmother
GM07357            Lymphoblast  CEPH/UTAH PEDIGREE 1345  maternal grandfather
GM07345            Lymphoblast  CEPH/UTAH PEDIGREE 1345  maternal grandmother